Method and apparatus for data storage and retrieval

ABSTRACT

Data storage is performed using the formation of records having a keyfield which is a numeric concatenation of at least two identifiers. Preferably, the keyfield contains an identifier of entity type and an identifier of an attribute and, more preferably, also an identifier of entity--where entity type is generic, such as "company", entity is specific, such as ABC Limited, and attribute is, for example, "telephone number". The numeric values are preferably obtained from a list of words and/or phrases for which a numeric value has been pre-assigned for each entry in the list. Preferably, some of the records store data and others of the records store details of the relationship between data. Beneficially, some of the records store data and others of the records control data processing.

FIELD OF THE INVENTION

The present invention provides the basis for the implementation of an improved approach to computing which is fundamentally different from known concepts and which mitigates many disadvantages associated with conventional computing.

BACKGROUND ART

The conventional approach to computing systems usually starts with the preparation of a detailed specification of what is required and how it is to be implemented. This initial stage is often fraught with difficulties. Often it is not understood what is actually required from a computer system until at least part of the system has been seen in operation. Additionally, many systems are so large and complex that it is extremely difficult even to produce a system specification which is completely internally consistent.

The second stage in building a conventional computer system is to pass the system specification to specialist programmers, and further difficulties are often encountered at this stage. Apart from potential problems in interpreting or simply implementing the requirements set out in the specification, one of the biggest problems facing conventional computing is the generation of documentation explaining what the system does and how it does it, and the subsequent maintenance of the computer programs. A problem frequently encountered in the maintenance of existing systems arises when it is desired to add an additional field to an existing record structure. Typically, all programs within the computer system which access the file containing the records in question have to be amended.

There have been many studies which show that perhaps as much as 80% of all modern programming effort is spent on maintaining existing systems, thus allowing only 20% of the available programming effort to be spent on developing new applications.

Difficulties arise even at the most basic level. For example, when data is stored on magnetic media, it is conventionally stored either in fixed length or variable length records. Especially where there is a requirement for the sorting of stored data and/or a desire for fast access, the stored data is indexed and fixed length records are preferred. The index may be stored as part of the original data or may be stored separately from it. The index is of fixed length. Fixed length records have the obvious disadvantage that the data must be tailored to fit the chosen length of record.

Very considerable human resources have been expended in devising computer database systems. This expenditure and the popularity of such systems attest to the underlying need for such systems in modern society. The system requirements become ever more sophisticated and one aspect of this has become the requirement for recording and processing of many-to-many relationships. Some known systems claim to meet these requirements but, as far as is known, they all appear to have certain limitations and/or very complex processing requirements.

DISCLOSURE OF THE INVENTION

The present invention seeks to mitigate all of the above-mentioned disadvantages using a conceptual approach which is distinct from that underlying conventional computing systems.

According to one aspect of the present invention there is provided a method of data storage and retrieval comprising the formation of records having a keyfield containing two numbers, one identifying an entity type and one identifying an attribute of an entity of the identified type.

According to another aspect of the present invention there is provided a method of data storage and retrieval comprising the formation of records having a keyfield containing three numbers, one identifying an entity type, one identifying an entity of the identified type and one identifying an attribute of the entity.

According to another aspect of the present invention there is provided a method of data storage and retrieval comprising the formation of records having a keyfield containing numbers derived from a look-up table or file in which numbers are assigned to words.

According to another aspect of the present invention there is provided a method of data storage and retrieval comprising the formation of data records and records which control the flow of data processing and storage of said records together in a common file.

According to another aspect of the present invention there is provided a method of data storage and retrieval comprising the storage of details of relationships between data as records having a keyfield containing information which enables identification of the keyfield of records containing the data for which details of the relationship are being stored.

According to another aspect of the present invention there is provided apparatus implementing the method described in any of the five preceding paragraphs.

BRIEF DESCRIPTION OF THE DRAWINGS

An embodiment of the present invention and modifications thereof will now be described, by way of example only and with reference to the accompanying drawings, in which:

FIG. 1 illustrates a typical record formed in accordance with an embodiment of the invention,

FIG. 2 illustrates the keyfield of a typical record formed in accordance with an embodiment of the invention,

FIG. 3 illustrates use of part of the keyfield to store compressed alphanumeric information in digital form,

FIGS. 4a and 4b illustrate examples of actual records formed in accordance with the described embodiment of the invention,

FIGS. 5a and 5b illustrate semantic networks, depicting the relationship between entity type and attribute in one case and between entity type and application in the other,

FIG. 6 illustrates a semantic network depicting the relationship between application and operator, and

FIG. 7 illustrates a semantic network depicting the relationship between entity type and thesaurus term.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Firstly, it is necessary to explain the information which is chosen for formation of the keyfield, since this is in itself an important aspect of the inventive concept. In the following explanation a certain nomenclature is used for ease of reference. The basis of this nomenclature is as follows:

When first considering an item of information, one considers the type of item in question. That is, one considers the "entity type".

Next one considers the particular item itself. That is, one considers the "entity".

Subsequently one considers what is known or desired to be known about the entity. That is, one considers the "attributes" of an entity.

A simple example will be given, based on a hypothetical company called ABC Limited.

the entity type is "company"

the entity is "ABC Limited"

the attribute of ABC Limited is its actual "business address"

another attribute of ABC Limited is its actual "business telephone number".

It is obvious that many different entity types may be recorded in a system, many entities may exist within an entity type, and many attributes may exist for any one entity. Entity types may themselves have attributes, if the entity type is being considered as an entity. For example, the feature "business address" can be considered an attribute of the entity type "company", i.e. all companies have a business address.

It is also true that any one entity may belong to several different entity types and that any one attribute may be an attribute of more than one entity type. For example, consider a company which is both a customer and a supplier. The entity values of entity types customer and supplier are both the same, i.e. the actual name of the company. Similarly, the attribute values of customer address and supplier address are both the same. The attribute values of contact name for supplier and customer are, however, likely to be different from each other.

Difficulties arise in conventional systems if one attempts to maintain the complex cross-relationships which can arise where one entity belongs to several different entity types, that is, if one sets out to avoid recording each relationship separately. Apart from the size of the data store required, there is the strong desire to avoid storing each relationship separately because of the difficulties which otherwise arise when it becomes necessary to update the data. It is preferable that common data is only stored once. There is then no risk of discrepancies arising between different entries for the same data. Only one entry needs to be updated and data entry is reduced. It is, however, difficult to achieve such benefits with conventional systems.

Consider another simple example:

ABC Limited has ten employees; consider "company" as the entity type, "ABC Limited" as the entity, "employee name" as an attribute description and, for example, "Kevin Smith" as an attribute value. However, the value of "employee name" can be considered as an entity in its own right (which can be considered as a sub-entity of the entity "ABC Limited"). That is, entity (or sub-entity) "employee name" may have "business telephone number" as an attribute description thereof. If each relationship is stored separately, the same telephone number is stored ten times, once for each employee. Moreover, if the telephone number changes there is a significant risk that not all ten instances of the telephone number will be updated consistently. Of course, it would be much better to store the telephone number only once. However, the attribute value of attribute description "business telephone number" is really an attribute of the relationship between the company and the employee. If the relationship is broken, i.e. Kevin Smith ceases to be an employee of ABC Limited, then the attribute value (i.e. actual telephone number) is clearly no longer valid. The entity "Kevin Smith" may well take on a new value for attribute description "business telephone number", but the relationship with ABC Limited is broken and as a result, clearly, the attribute (business telephone number) of that relationship is no longer valid. Removing the relationship also removes the attribute. It is the storage of data in accordance with these concepts which is not achieved by conventional systems. In contrast, the present invention is predicated upon data storage in accordance with such concepts.

As mentioned above in relation to the value of "employee name", entities may have sub-entities. This should be clear from the described recognition that "Kevin Smith" was likely to retain a "business telephone number" even though the previous attribute value (i.e. actual number) was removed because the attribute had been removed as a consequence of the relationship (i.e. employment by ABC Limited) being removed. The fact that "Kevin Smith" retains an attribute with a description "business telephone number" indicates that "Kevin Smith" is in fact being considered as a separate entity. This particular entity was, originally, a sub-entity of the entity "ABC Limited". The sub-entity "Kevin Smith" will have an attribute with a description of "home address". However, "home address" may itself have an attribute. An example might be an attribute with a description of "rateable value". The value of the attribute having a description of "home address" is, thus, itself being considered as an entity. Here we have "Kevin Smith" as a sub-entity of "ABC Limited" and "home address" as a sub-entity of "Kevin Smith". This can be considered as two descending levels of entity. In the described embodiment of the invention, up to 10⁹ values are available to identify separate entity values. By descending one level, i.e. considering a first level of sub-entities, a further 10⁹ values become available. If the highest level of entity values each have a separate sub-entity level associated therewith, it can be seen that 10⁹ × 10⁹ values can be stored--and so on, for each sub-level of entity.

Conventionally, one often also has to consider the physical location of the records on the magnetic media. This is likely to have been selected using a hashing routine whereby the pointer to the record relates directly to the address of the physical location on the magnetic media where the record is stored. Consequently, access to the records in a conventional system will often involve significant movement between physical locations on the magnetic media. This can be avoided using the present invention, with consequential improvements in access and processing times.

Going back to a general consideration of the information which is actually being stored, and indeed the knowledge associated with or arising from that information: if one simply inspects a record within the database, there is no indication of the meaning of the data. That is, records do not store the context of the data. In using the system of the present invention, one retrieves the data stored in a record by asking questions, the answers to which effectively navigate through the records using the keyfields. Thus, when a certain record is retrieved, the questions or navigation to that record have defined the context in which the data is being considered. Since the context is thus defined, it need not be stored in the records themselves. This is in strict contrast to conventional data storage systems. Moreover, a point of prime significance results. Namely, one can change any one individual portion of the database without having to consider the potential consequences upon the remainder of the system. That is, one only need ensure that each individual relationship is correctly recorded to ensure the integrity of the whole system. This is in extremely stark contrast to conventional systems.

In a conventional system, change to any one part of the system must be considered in terms of the potential consequential effects upon the whole of the rest of the system. Often, it is virtually beyond the realistic abilities of the persons responsible to comprehend and adequately deal with all of the consequences arising from a change to one portion of a system. As a minimum, such changes in a conventional system incur an enormous overhead in simply maintaining the system. The time and effort spent in maintaining the system cannot, of course, be spent where it would be more productive, i.e. in extending the system or building new systems.

An important aspect of this feature is that validation takes place within the database rather than within a separate program.

Thus, it becomes apparent that what is achieved by the present invention is effectively storage of the rules and procedures conventionally encoded in computer programs actually as part of the stored data itself. At the same time, the context of the data is not stored as part of the data; it arises from the data retrieval process itself.

That is, there is knowledge inherent in the data retrieval exercise. But the presence of such knowledge, or at the very minimum the use thereof, has not previously been recognised. Moreover, the potential to use that knowledge so as to build a system which does not require vast effort to be expended in maintaining the integrity of the system as a whole has not previously been recognised.

The route taken to reach a record can be considered as navigation through the stored data. The mechanism by which navigation is achieved is explained hereinafter.

In the system of the present invention, simply keeping a record of the route taken to reach a record enables one to navigate `backwards` through the database. Such ability is often desirable. It is usually absent in conventional systems and, when present, involves enormously complex rules and procedures.

In conventional computing, one builds a prototype to demonstrate and/or prove a proposed application. Subsequently, the prototype has to be encoded to precede the full working system. Usually, it is the encoding stage which is the most time consuming and which is the most error prone. With the present invention, prototyping can be performed on the `live` system (because there are no potential consequences upon the integrity of the rest of the system) and the prototype is immediately the final live version of the desired application.

These considerations, together with the benefits described above with regard to record location (in the ABC Limited employee telephone number example), result in remarkably improved efficiency compared with conventional computing techniques. Access times on data retrieval are improved. Application development times are vastly improved and there is little point of comparison between the maintenance overheads of conventional systems and the almost complete absence of maintenance of the systems of the present invention.

There are still further advantages which accrue from the present invention. One such advantage is that the whole system is controlled by a very small `core` program. The core program can be held in its entirety in the processing memory of conventional computing hardware (even with many users and with virtual, or paging, hardware systems). The embodiment being described was implemented on an IBM AS400 machine.

Another advantage accruing from the present invention relates to updates of the core program. If at any time it is considered desirable to update the core program, perhaps to enable some new form of processing to be undertaken in relation to data stored in certain record types, the updated core program can be implemented with great ease. In general terms, it would be unusual for an update to the core program to be required. However, should any such update be required, it can be dealt with without disturbing the application rules and procedures. This is possible because what would conventionally be considered as the "applications programs" are not dealt with as "programs". They do not form part of, or modify in any way, the core program. The equivalent of the conventional applications programs are, as previously mentioned, effectively stored according to the present invention as part of the data itself and/or are partially stored inherently as part of the context or navigation involved in retrieving data.

The problem of documentation is largely avoided, because there are effectively no programs in the conventional sense thereof. Thus, there is little to actually document as compared with a conventional system.

The system of the present invention enables the implementation of the definition of an inverse relationship. This is a particularly beneficial feature of the present invention.

The embodiment of the invention which is described herein involves, in essence, the use of a single database file in which various types of records are stored. The records are of fixed length and have a fixed length, relatively short, keyfield. Data is typically stored in terms of the relationships which exist between the data. Data processing is achieved using records stored in the database, in stark contrast to the conventional conceptual view and use of programs. Serious disadvantages inherent in conventional arrangements are mitigated and additional benefits can be obtained.

The records stored within the database may, according to the present invention, have many distinct data structures for the remainder of the fixed length record after the keyfield. Different structures may exist within one particular type of record. Thus, it is convenient to consider records according to their function. The following are functionally distinct record types:

relationship records, simple data records, option records, menu records, surrogate records and transaction records. Although functionally distinct, all of these record types may be stored in a common database because they are all of the same fixed length and all have the same keyfield structure.

As mentioned, data is typically stored in terms of relationships and the corresponding records are referred to herein as relationship records. Sometimes there is independent data to be stored and simple data records are used for that purpose. An option record is the name which is given to a record which is used to control data processing. The selection of the different operations to be undertaken is carried out using so-called menu records. So-called surrogate records are used to store cross-referencing information. Transaction records are used to implement an independent audit trail. This audit trail is not an intrinsic part of the user data stored by the system. All of these types of record are stored in the same database and they all have the same basic structure. Formation and use of the different types of record are described hereinafter. First, however, a description is given of the common record format and in particular of the keyfield.

Simply stated, the record used in this embodiment of the invention is 128 bytes in length. Of this, 28 bytes are used to form the keyfield. The keyfield can be considered as consisting of seven fields each four bytes in length, each four byte field being constituted by nine decimal digits stored as a four byte integer. As just noted, the keyfield is composed of digits, i.e. it is purely numeric. However, the keyfield will in most practical computing uses need to identify alpha or alpha-numeric data. In the embodiment of the present invention, a separate look-up table is used for at least a major part of the translation between alpha or alpha-numeric and numeric values. It is the particular structure and significance of the various portions of the keyfield from which many of the advantages of the invention accrue.
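The record layout just described can be sketched in a few lines of Python. This is an illustration only, not the implementation of the embodiment: the big-endian byte order, the zero padding of the unused remainder and the names pack_record and unpack_keyfield are all assumptions made for the example.

```python
import struct

# Seven 4-byte signed integers; the big-endian byte order is an assumption.
KEYFIELD_FORMAT = ">7i"
KEYFIELD_LENGTH = struct.calcsize(KEYFIELD_FORMAT)  # 7 x 4 = 28 bytes
RECORD_LENGTH = 128
MAX_FIELD_VALUE = 999_999_999  # nine decimal digits per 4-byte field


def pack_record(key_fields, payload=b""):
    """Build one fixed length record from seven numeric key fields (hypothetical helper)."""
    if len(key_fields) != 7:
        raise ValueError("the keyfield consists of exactly seven fields")
    for value in key_fields:
        if not -MAX_FIELD_VALUE <= value <= MAX_FIELD_VALUE:
            raise ValueError("each field holds at most nine decimal digits")
    body_length = RECORD_LENGTH - KEYFIELD_LENGTH
    if len(payload) > body_length:
        raise ValueError("payload does not fit in the remainder of the record")
    # Padding the remainder with zero bytes is an assumption of this sketch.
    return struct.pack(KEYFIELD_FORMAT, *key_fields) + payload.ljust(body_length, b"\x00")


def unpack_keyfield(record):
    """Return the seven numeric key fields stored in the first 28 bytes."""
    return struct.unpack(KEYFIELD_FORMAT, record[:KEYFIELD_LENGTH])
```

Because a signed four byte integer can hold any value up to 2,147,483,647, every nine digit decimal number fits in one field, with the sign doubling the number of usable values.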

The look-up table referred to above can be considered as, and is referred to hereinafter as, a word directory. The structure and significance of the various portions of the keyfield are associated with the fact that data is stored by the system in terms of relationships.

An implementation of the present invention has been achieved starting from the creation of a word directory. The word directory was created by entering into a conventional computer database a vast number of commonly used words and proper names, in the English language and in other Arabic character based languages. Each word was allocated a number, the number being stored in association with the word in the conventional database. It was decided that eight digits would be used to record the number, setting the maximum number of words at 10⁸. In allocating numbers to the initially input words, the words were arranged in alphabetic sequence and an approximately even spread of numbers throughout the range 0 to 10⁸ was chosen.

In subsequent development and operation of the system, each word entered into the system is added to the word directory--if, of course, it is not already present in the directory.

The numeric value associated with each word is used when forming the keyfield or index of an item of data to be stored.
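A word directory of the kind described can be sketched as follows. The even spread over the alphabetically sorted initial words follows the text. The rule used here for later additions (taking the midpoint between the numbers of the alphabetical neighbours), the case folding and the class name WordDirectory are assumptions made for the example, since the text does not say how numbers are chosen for words added later.

```python
import bisect

WORD_SPACE = 10**8  # eight decimal digits per word number


class WordDirectory:
    """Hypothetical in-memory word directory mapping words to numbers."""

    def __init__(self, initial_words):
        # Sort alphabetically and spread the numbers evenly over 0..10^8.
        self._words = sorted({word.lower() for word in initial_words})
        step = WORD_SPACE // (len(self._words) + 1)
        self._numbers = [step * (i + 1) for i in range(len(self._words))]

    def lookup(self, word):
        """Return the number for a word, or None if the word is absent."""
        word = word.lower()
        i = bisect.bisect_left(self._words, word)
        if i < len(self._words) and self._words[i] == word:
            return self._numbers[i]
        return None

    def add(self, word):
        """Add a word if it is not already present and return its number."""
        existing = self.lookup(word)
        if existing is not None:
            return existing
        word = word.lower()
        i = bisect.bisect_left(self._words, word)
        low = self._numbers[i - 1] if i > 0 else 0
        high = self._numbers[i] if i < len(self._numbers) else WORD_SPACE
        if high - low < 2:
            raise ValueError("no free number between the neighbouring words")
        # Assumption: the midpoint keeps numeric order aligned with alphabetic order.
        number = (low + high) // 2
        self._words.insert(i, word)
        self._numbers.insert(i, number)
        return number
```

Leaving gaps between the initial numbers is what allows a later word to receive a number that still sorts in alphabetic position.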

One distinguishing feature of the present invention is the use of a fixed length keyfield which may appear to the user to be of variable length. It was decided to produce an implementation of the invention having fixed length records each 128 bytes in length. The first 28 bytes form the keyfield (or index) of the record. However, the invention could be implemented using variable length records.

The fixed length keyfield is achieved in this embodiment of the invention by:

storing data in the keyfield in numeric form,

using the word directory,

applying data reduction to the information to form the keyfield, and by

determining the data to be used to form the keyfield.

Having taken these steps, the inventive concept then enabled further unique features to be realised.

Returning now to the practical detail of how such storage is achieved, attention is directed to FIGS. 1 to 3 of the accompanying drawings. It will be recalled that each record is 128 bytes in length and the keyfield of the record uses the first 28 bytes. The keyfield includes data identifying: entity type, entity, attribute. The actual value of each of entity type, entity and attribute is stored as a separate 128 byte record. The 28 byte keyfield of a record is formed using a concatenation of the numerical values of entity type, entity and attribute of the data which is being stored. Each numerical value is 4 bytes in length and consists of 9 digits. Each 4 bytes is capable of having 2×10⁹ different values (allowing for + and - values). It is therefore apparent that the selection of 4 bytes each of 9 digits to identify entity type (and the same for entity and the same for attribute) will be sufficient for just about all realisable uses of the system. The numeric value of each of entity type, entity and attribute is obtained using the word directory (which is used to form the keyfield of the actual associated records).

In the keyfield, 22 digits are allocated for storage of the alphanumeric (natural language) description of the item which is being recorded. This description is stored in numeric form, thereby achieving a fixed length storage field for what appears to be a variable length item of data. FIGS. 2 and 3 indicate the 22 digits which are used for this purpose. In the embodiment, it was decided that the first five "words" would be sufficient to record the alphanumeric description of any item. This choice having been made and 22 digits having been allocated for the storage thereof, the two are brought together in the following manner:

The significance of the first five words of the description is reduced sequentially. It being recalled that 8 digits are used to define words uniquely in the word directory, the number of digits stored for each of the five words is:

    ______________________________________
    1st word         8 digits (i.e. stored in full)
    2nd word         5 digits
    3rd word         4 digits
    4th word         3 digits
    5th word         2 digits
    Total           22 digits
    ______________________________________

That is, the data compression results in a stored value which uniquely identifies only the first word, with an increasing likelihood of more than one word in the word directory matching the decompressed values of the second to fifth words. However, for functions based on the keyfield such as alphanumeric sorting of the records stored in a database, the correct result is obtained sufficiently often to be acceptable as a system constraint.
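The reduction of five word numbers to 22 digits can be sketched as below. Two points are assumptions rather than statements of the text: that "reducing the significance" of a word means keeping only its leading (most significant) digits, which is what keeps sorting approximately alphabetic, and that a description of fewer than five words is padded with zeros. The function name compress_description is hypothetical.

```python
# Digits kept for the 1st to 5th word of a description: 8 + 5 + 4 + 3 + 2 = 22.
DIGITS_KEPT = (8, 5, 4, 3, 2)
WORD_DIGITS = 8  # every word number has eight digits in the word directory


def compress_description(word_numbers):
    """Reduce up to five 8-digit word numbers to a 22-digit string."""
    parts = []
    for position, kept in enumerate(DIGITS_KEPT):
        # Assumption: missing words are treated as zero.
        number = word_numbers[position] if position < len(word_numbers) else 0
        if not 0 <= number < 10**WORD_DIGITS:
            raise ValueError("word numbers must have at most eight digits")
        full = f"{number:0{WORD_DIGITS}d}"
        # Assumption: the leading digits are the ones retained.
        parts.append(full[:kept])
    return "".join(parts)
```

For example, compress_description([50000000, 12345678, 87654321, 11122233, 99887766]) yields "5000000012345876511199": the first word survives in full, while the fifth is known only to its two leading digits, so many directory entries would match it.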

The system is capable of storing: 2×10⁶ entity types, 2×10⁹ entities for each entity type, and 2×10⁹ attributes for each entity.

The keyfield, or index, of each record is thus based upon a unique number made up from the concatenated numeric values of:

    entity type+entity+attribute+description

    4 bytes+4 bytes+4 bytes+22 digits.

A particularly significant enhanced aspect of the invention arises from the inventive recognition that a unique number is obtained by considering only the concatenated numeric values of entity type and attribute description:

    i.e. entity type number+attribute description number

    4 bytes+4 bytes

results in a unique number. This unique number defines the purpose and context of the surrogate number, but does not define the surrogate number itself. Surrogate numbers are explained hereinafter.

FIGS. 1 to 4 of the accompanying drawings illustrate examples of records formed in accordance with the described embodiment of the invention. These figures show various record fields each 4 bytes in length. A maximum of 12 such fields are shown (in FIGS. 4a and 4b) and for ease of reference the fields have been labelled as A to L. Fields A to G constitute the keyfield, i.e. the first 28 bytes of the record. Fields H to L are, for relationship records, the last 20 bytes of the record. Other types of record will usually have a different data structure for the remainder of the record after the keyfield. Within each field the individual digits will be referred to as 1 to 9, consecutively. Thus, the first digit of the record will be referred to as A1 and the last digit of the record will be referred to as L9. This nomenclature is purely for ease of reference and is of no significance to the embodiment of the invention.

FIG. 1 illustrates how the 128 byte record of a relationship record according to the embodiment of the invention is formed of a 28 byte keyfield, 15 bytes of common data and 85 bytes of user data.

FIG. 2 illustrates how the 28 byte keyfield is formed of seven fields each 4 bytes in length. The first field contains the entity type number. The second field contains a version number. The third and fourth fields contain the entity number and the attribute number, respectively. Fields E, F and G contain different types of data, in numeric form. Each 4 byte field consists of 9 digits.

Field A contains more than simply the entity type number. In particular, digits A1, A8 and A9 have special significance.

Digit A1 has a value of "0", "3", "5" or "8". A value of "0" indicates that the record contains data. A value of "3" indicates that the record defines a menu. Menus, however, are only a specific form of data. Another record type used in the embodiment of the invention is an Option record. Option records are used for process control. An Option record can control processing of other records, be they, e.g., simple data, menu records or other option records. To obtain the value of digit A1 for an Option record, "5" is added to the value indicating data or menu. Thus a value of "5" in digit A1 indicates an Option record for controlling data processing and a value of "8" indicates an Option record for controlling menu processing.
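The encoding of digit A1 can be illustrated with a short sketch. The function name describe_a1 and the shape of the returned dictionary are hypothetical; only the four digit values and the "add 5 for an Option record" rule come from the text.

```python
def describe_a1(digit):
    """Interpret digit A1: 0 = data, 3 = menu, plus 5 for an Option record."""
    if digit not in (0, 3, 5, 8):
        raise ValueError("digit A1 must be 0, 3, 5 or 8")
    is_option = digit >= 5
    # Remove the Option offset to recover the underlying data/menu value.
    base = digit - 5 if is_option else digit
    kind = "data" if base == 0 else "menu"
    return {"is_option": is_option, "kind": kind}
```

Under this reading, describe_a1(8) reports an Option record controlling menu processing, and describe_a1(0) reports a plain data record.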

It has previously been noted that field A contains data in addition to the entity type number. In detail, the entity type number comprises only six digits, these being digits A2 to A7 inclusive. Digit A8 can have a value of `0` or `9`. A value of `0` indicates an entity identifier and a value of `9` indicates an attribute description within entity type and/or attribute values, i.e. sub-entities.

Digit A9 is used to indicate the type of identifier in fields E, F and the first part of G. Digit A9 can have any of values "0", "1", "2", "3", "4", "5", "7" and "8". The significance of the different values is as follows:

    ______________________________________
    0 - name                         5 - text without paragraph heads
    1 - number                       6 - not used
    2 - date                         7 - uncompressed 8 ch id.
    3 - synonym                      8 - uncompressed 10 ch id.
    4 - text with paragraph heads    9 - not used
    ______________________________________

A value of `0` signifies an entity type identifier, i.e. initial menu and entity records. This value is not used for second level menus or entity records.
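Taken together, the digit positions of field A can be separated as in the following sketch. The function name split_field_a, the dictionary keys and the restriction to non-negative field values are assumptions made for the example; the digit positions and the meanings of A8 and A9 are those given above.

```python
A8_MEANINGS = {
    0: "entity identifier",
    9: "attribute description or sub-entity",
}

A9_MEANINGS = {
    0: "name",
    1: "number",
    2: "date",
    3: "synonym",
    4: "text with paragraph heads",
    5: "text without paragraph heads",
    7: "uncompressed 8 ch id.",
    8: "uncompressed 10 ch id.",
}


def split_field_a(value):
    """Split the nine digits of field A into A1, entity type (A2-A7), A8 and A9."""
    # Assumption: only non-negative field values are handled in this sketch.
    if not 0 <= value <= 999_999_999:
        raise ValueError("field A must be a non-negative nine digit number")
    digits = f"{value:09d}"
    a8 = int(digits[7])
    a9 = int(digits[8])
    if a8 not in A8_MEANINGS or a9 not in A9_MEANINGS:
        raise ValueError("digit A8 or A9 holds an unused value")
    return {
        "a1": int(digits[0]),
        "entity_type": int(digits[1:7]),
        "a8": A8_MEANINGS[a8],
        "a9": A9_MEANINGS[a9],
    }
```

For instance, split_field_a(300004291) describes a menu record (A1 = 3) for entity type number 42 whose identifier is an attribute description or sub-entity (A8 = 9) holding a number (A9 = 1).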

The version number stored in field B contains an indication of, inter alia, whether the data is public or private. In accordance with this setting, different operators might, for example, retrieve different values for the `same field` of the one record. It is also used in multilingual applications, i.e. the same data stored in different languages.

As previously mentioned, field B stores the version number of therecord. If the value stored in field B (considering all 9 digits tocollectively represent a normal decimal number) is less than 000010000the contents of the entire record is open to public access. If the valuein field B is greater than or equal to 000010000 then the contents ofthe entire record are restricted to members of the user group identifiedby the value stored in field B Thus, 99,999 versions of a record can bestored for access by different user groups.
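The public/private test on field B can be sketched as below. The function names and the representation of a user's memberships as a set of integers are assumptions made for illustration; in particular, the sketch assumes that the field B value itself serves as the user group identifier, as the paragraph above suggests.

```python
PRIVATE_THRESHOLD = 10_000  # field B values below 000010000 are public


def access_scope(field_b: int) -> str:
    """Classify a nine-digit field B value as 'public' or 'private'."""
    if not 0 <= field_b <= 999_999_999:
        raise ValueError("field B must fit in nine decimal digits")
    return "public" if field_b < PRIVATE_THRESHOLD else "private"


def may_read(field_b: int, user_groups: set[int]) -> bool:
    """Public records are readable by anyone; private records only by
    members of the group identified by the field B value (assumption)."""
    return access_scope(field_b) == "public" or field_b in user_groups
```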

The concept of private data can be used in various ways. This is of particular benefit when one considers that a user's view of the stored user data is that it is made up of a multiple (but variable) number of fixed length records, so that the user records appear to be of variable length. The content of field B may be set so as to indicate that the record contains private credit control information. Only persons having membership of the appropriate user group (and having the appropriate security level clearance) are able to access the data contained in the record. The distinction between public and private data can also be used in relation to the storage of rules and procedures, ie. the equivalent of conventional programs. This is significant where part of the data in the record is publicly available and part of the data is subject to private access only. As an example of the use of field B, a commercially prepared "application program" for use with the system might have the content of field B set so as to indicate public data. Any extensions which the user might prepare to supplement the application program would be stored as private data. Thus, any subsequent release of the application program can safely be implemented by over-writing only that part of the application program which is flagged as being public data. The user defined extensions will not be over-written by the update process, as is the usual consequence in conventional systems. A further example of the potential use of field B to differentiate between private and public data is to use the distinction so as to identify "private" help text stored in the record. That is, one can very readily implement a help system which is tailored to specific users or user groups.

Fields C and D respectively contain the entity number and attribute description number, as previously mentioned. Surrogate is a type of record number. For a surrogate record, the value of field C is always 0, as is the case of the record shown in FIG. 4B. This implementation arises from the recognition that the concatenation of the entity type number and the attribute number provides a unique number. This unique number is used with the surrogate number in order to navigate through the database, as explained hereinafter.

Although all records are stored in a single file, the keyfield can be considered as identifying different tables of data within the file as well as, of course, the unique records. In this sense, fields A, B, C and D may be considered as identifying a particular table, with fields E, F and G identifying records within a table. In fact, the record identifier consisting of fields E, F and G is prefixed by digit A9, which is really part of the record identifier.

FIG. 3 illustrates how the fields E and F, together with the first four digits of field G, are typically used to store the 22 digits formed by data compression of certain alphanumeric information. Digit G5 is used as a suffix to enable duplicates to be distinguished. Digits G6, G7 and G8 are used as a second suffix. This record suffix enables simple storage of multiple relationships between the same records with the same types of relationship. Digit G9 is usually set to a default value of "9". If digit A9 has a value of "0", fields E, F and G (first 4 digits) may contain a compressed name or an 18 digit uncompressed number with duplicates allowed, or a 22 digit number with no duplicates allowed. These different possibilities are dictated by option records. If digit A9 has a value of "1", fields E, F and G (first 4 digits) may store a 9 digit number in E, or an 18 or 22 digit number as described above. Thus, if the record (from the user viewpoint) can be identified by number, a record with A9 set to `0` will contain a compressed name in fields E, F and G. A record with A9 set to `1` will contain a number (18, 19 or 22 digits). If a record is identified by both a 9 digit number and an 18 digit number, digit A9 will be set to `0` and fields E, F and G (first part) will contain 18 digits.

Fields E, F and the first 4 digits of G only contain the previously described 22 digits of compressed alphanumeric data if the record id is "name", ie. if digit A9 has a value of "0".

Field E for an option record is always zero.

As with field E, the content of field F depends upon the record type as controlled by digit A9. If digit A9 is "0", fields E and F contain the first eighteen digits of the twenty-two digits of the compressed name. If digit A9 has a value of "1", the date of creation of the record is stored in field E and the time of creation of the record is stored in field F. If the value of digit A9 is "7", field E stores the first four characters and field F the second four characters of the uncompressed eight character id.

If a particular record is identified by number (digit A9=1) the first four digits of field G are "0" unless fields E and the first part of G contain an 18 digit name, with duplicates allowed, or a 22 digit name. If the record is identified by name, the first four digits of field G are the last four digits of the twenty-two digit compressed name. If digit A9 has a value of "7", the first four digits of field G will be zero. If digit A9 has a value of "8", the first two bytes of the four bytes of field G are the last two bytes of the uncompressed name.

The difference between an 8 character uncompressed name and a 10 character uncompressed name is that the use of 8 characters allows the presence of duplicates whereas the use of 10 characters prohibits the presence of duplicates. However, duplicate relationships can be created using digits G6 to G8. On the particular computer (IBM AS400) used in implementing the described embodiment of the invention, many objects are identified using a 10 character name.

Digit G5 normally has a value of "1" but can be used as a suffix so as to permit the presence of duplicates when using an uncompressed 8 character ID, that is with digit A9 having a value of "7". However, if a 10 character uncompressed ID is being used, digit G5 is used as part of the first two bytes of the field, in storing the last two bytes of the uncompressed name.

Digits G6-G8 inclusive are used by way of a further suffix to allow the storage of multiple relationships of the same type between the same two records. However, if required, digit G9 can be used as a third suffix for storing extra data in a record. In the described embodiment, the third suffix has not been used and digit G9 has been set to a default of "9".

Each record is one iteration of one attribute, ie. contains one attribute value--which may span several fields, with a maximum of 85 bytes available if it is not a relationship. In the event that it is a relationship, 65 bytes are available, plus the cross reference field. Digit G9 could be used effectively to extend the amount of data that can be stored in one iteration of one attribute. Because a cross reference is stored in a record in which a value of "9" is assigned to digit G9, there is no need to set digit G9 to a value of `8` in the corresponding record. Using digits G5 to G8, 4 times 85 bytes plus 65 bytes can be used for one iteration of one record.

Further enhancements of the described embodiment of the invention may use values of `0` to `4` inclusive for G9 as "tags" or "overrides" on records. The potential use of these facilities is not described herein in detail.

In the implemented embodiment of the invention, the core program reads all records for a given keyfield, that is, it includes a search for all values of digit G9 in the range 0 to 9 inclusive. Thus, all records are located at once, even though there is usually only one (G9="9"). This improves processing time.

The following is an example of entry of basic entity data. Consider storage of a telephone number. Digit A9 is set to a value of "0" and the twenty-two digits stored in fields E, F and the first part of G contain "telephone numbers", with room for a suffix to allow for duplicates should this ever be necessary. This is the entry for the entity itself rather than a sub-level. Both the entity number and attribute number are 0. That is, fields C and D are both 0.

For relationship records, the user data portion of each record stores, as a cross-reference, part of the keyfield of any related record, eg. the record containing the data with which a relationship exists. The remaining part of the key of the related record can be determined from the context in which the record having the cross-reference in question has been accessed. Attention is directed to FIGS. 4a and 4b. The cross-reference data is stored in fields I, J, K and L. Field I contains a copy of field A of the related record.

Fields A to G are always the keyfield and fields H to L are used to store a cross reference, except for surrogate records, in which case field H contains the surrogate number of the related record. Where only one record is being recorded, for example when the actual value of "height" is being stored rather than storing a relationship, fields H to L inclusive are used to store user data. Otherwise field H contains the surrogate of any records directly related to that in question. That is, the content of field H for the two records shown in FIG. 4 is the same. Indeed, it will be seen that the value contained in field H of both records is in fact the entity number (ie. the content of field C) of the first of the two records.

The content of fields I to L depends on whether or not details of a relationship are being stored. If the data being stored does not concern a relationship, but more than one record is being recorded at the same time, the content of field I is the same as the content of field A of the other record being written at the same time. In these circumstances, the contents of fields J, K and L are the same as the contents of fields J, K and L of the other record, respectively.

Fields I, J, K and L contain the cross-reference, unless the record is a surrogate record or only one record is being recorded at that time. In these latter cases, fields I to L contain part of the user data. In the case of a surrogate record in a relationship, the surrogate numbers of both records are stored. Fields J to L for a surrogate record contain part of the keyfield (ie., E to G) of the fourth record written, where a fourth record is recorded.

Relationship Records

Consider storage of the telephone number of ABC Ltd. In order to store this information, three records are written to the disc. These records are: (1) a surrogate record, (2) user of telephone number, (3) telephone number of user. Records 2 and 3 are effectively the inverse of each other.

First, the surrogate record is written. The value of digit A8 is set to "9", which identifies the record as a surrogate record. The entity number is written in field C and the attribute number is written in field D. Thus, the keyfield can be constructed.

The value of field A is known. At a system level one chooses either business or telephone number as the prime (that is, the one under which the surrogate record is written) and this selection is controlled by an option record. In this case the value of field A identifies the entity type client.

Field B is the version number, which simply records whether the data is available for public or private access, or is in one language or another.

Field C is zero.

Field D stores the compressed form of the attribute description, ie., the surrogate of the attribute description, in accordance with the menu selection, eg. telephone number for speech. That is, the content of field D is actually the surrogate number of the attribute.

The content of field E is the surrogate number of the relationship being created.

The surrogate number is determined in the following manner. First of all, a notional record keyfield is prepared in which the value of "9" is assigned to all of digits E1 to E9 inclusive. An attempt is then made to position the pointer within the database. One record is then read backwards to locate the last surrogate record for this entity type and attribute combination. "1" is added to the value of field E of that record; if there is no previous surrogate, the value will be "1". This then gives the new value for field E, that is, the surrogate number to be used when the record currently being prepared is written.
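The allocation steps above can be sketched as a position-then-read-backwards operation on an ordered index. This is a minimal sketch under stated assumptions: the database is modelled as an ascending list of `(context, field_E)` pairs, where `context` is a hypothetical stand-in for the key digits that precede field E, and `next_surrogate` is an illustrative name, not part of the described system.

```python
from bisect import bisect_right

MAX_E = 999_999_999  # notional key: digits E1 to E9 all set to "9"


def next_surrogate(sorted_keys: list[tuple[str, int]], context: str) -> int:
    """Return the next surrogate number for the given key context.

    sorted_keys must be in ascending order, as an indexed file would be.
    """
    # Position the pointer at the notional all-nines key for this context.
    pos = bisect_right(sorted_keys, (context, MAX_E))
    # Read backwards one record; it counts only if it shares the context.
    if pos > 0 and sorted_keys[pos - 1][0] == context:
        last = sorted_keys[pos - 1][1]
        if last >= MAX_E:
            raise OverflowError("surrogate range exhausted for this context")
        return last + 1
    # No previous surrogate for this context: start at 1.
    return 1
```

Because the numbering restarts within each context, the same surrogate value can safely recur under different entity type and attribute combinations.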

The value of field F is always zero.

The content of field G is the same as in the case of the normal entity record. In the case of the described implementation of the invention, the value of field G for a surrogate record is 000010019, because duplicate surrogates are excluded.

The number provided by the concatenation of fields A, B and D is unique. Thus, the number provided by the concatenation of fields A, B, D & E is unique. It is particularly important to note that the surrogate number is only relevant in the context to which it applies. The content of fields A, B and D defines that context. That is, the key consists of fields A, B, D and the surrogate number (field E). Thus, the implementation of the invention is not limited by a nine digit surrogate number.

Fields H to L contain the surrogate number of the two related records, if either or both of the records are identified by surrogate. Generally, fields H to L are not used for this purpose, unless four records are being written simultaneously rather than three. Often, only three records are written. Four records are, however, written at various times, for example for the preparation of menus. In this case the four records are: surrogate, of attribute value, sequence number within attribute, name of menu in which attribute appears.

The Second Record to be Written (Recordal of Relationship)

What are here described as the second and the third records can effectively be interchanged without any significant effect.

Field A contains the entity type number (telephone number).

Field B contains the version number.

Field C contains the entity number, which is the surrogate of the actual telephone number. In writing the record, the surrogate is determined and is then used as the identifier.

Field D contains the attribute number. That is, the "telephone number for speech of", which is the surrogate of attribute description.

In accordance with the preceding paragraph, it is to be noted that the recordal process is more complex than described here. For new entities or independent data identified by name, it involves cross checking with the word directory and potentially writing new records to the word directory.

Fields E, F and the first half of G are used to store the compressed name (ie. ABC Ltd.). The second half of field G is used for suffix information, as described above.

Field H is the surrogate number, which is the same as the content of field H of the surrogate record. Thus, the value of field H is the same in all related records (apart from surrogate records--for which there is no need to store the number twice within that record).

Field I has the same value as field A of the related (ie. third) record. That is, the entity number--with its prefix and suffix.

Field E has digits E6 to E9 inclusive set equal to digits G6 to G9 of the related (ie. third) record. That is the suffix. Thus, one can identify if any duplicates exist. The purpose is to know the key when duplicates do exist.

Digits E2 to E5 inclusive contain the version number of the related record, if not zero. This enables determination of the value of field B of the related record. That is, the user group number. (Indeed, the user group number is usually the actual value of field B).

Digits E2 to E5 inclusive may all have the value "zero", in which case the value in field B is the user group number. If any of digits E2 to E5 has a non-zero value, then digits E2 to E5 contain the value of field B of the related record. That is, digits B6 to B9 inclusive of the related record are respectively equal to the values of digits E2 to E5, with the value of digits B1 to B5 inclusive being zero.
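The recovery of the related record's field B from digits E2 to E5 can be sketched as below. The function name is hypothetical and the nine-digit field is assumed to be held as a string. The all-zero case is returned as `None` here, leaving the caller to fall back on field B as the text describes; that treatment is an interpretation, not something the text prescribes.

```python
from typing import Optional


def related_field_b(xref_field: str) -> Optional[str]:
    """Rebuild the related record's nine-digit field B from digits 2 to 5
    of the nine-digit cross-reference field.

    Returns None when digits 2 to 5 are all zero, ie. when no separate
    version number has been recorded for the related record.
    """
    if len(xref_field) != 9 or not xref_field.isdigit():
        raise ValueError("expected exactly 9 decimal digits")
    e2_to_e5 = xref_field[1:5]
    if e2_to_e5 == "0000":
        return None
    # B1 to B5 are zero; B6 to B9 equal E2 to E5 respectively.
    return "00000" + e2_to_e5
```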

Third Record of the Relationship

The third record is written in the same way as the second record, but in the inverse.

As can be seen from the second and third records, one can identify the surrogate record, but one only has the partial key of the related record in the cross reference. However, because of the context, it is possible to determine the full keyfield of the second (related) record.

In order to achieve this, one cannot use the surrogate, because fields E, F and the first half of field G of the second record are not recorded in the cross reference (fields I to L of the third record).

However, one has arrived at or identified the third record (or higher level attribute value surrogate) via the entity record of the related record, and that contains the missing information. An array in the core program records the content of fields E, F and the first part of field G of the entity record. The core program keeps the full history (using this data storage) for up to 99 levels. This enables the backtracking facility within the system. More than 99 levels can be tracked by using a temporary storage file.

Fields A to G use 28 bytes. A further 65 bytes are used for user data (as we are concerned with storage of a relationship). As usual, 15 bytes are allocated to common data (which includes a transaction number--no use being made of the AS400 relative record number facility). The 4 bytes of field H contain the surrogate number. The remaining 16 bytes, which contain cross reference information, are stored in fields I to L.
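The byte allocation just described can be checked with simple arithmetic. The constant names below are illustrative only; the figures are those given in the text, and the overall total is merely their sum rather than a value stated in the description.

```python
# Byte allocation for a relationship record, using the figures given above.
KEYFIELD_BYTES = 28      # fields A to G
USER_DATA_BYTES = 65     # user data in a relationship record
COMMON_DATA_BYTES = 15   # common data, including the transaction number
SURROGATE_BYTES = 4      # field H
XREF_BYTES = 16          # fields I to L

# When no relationship is stored, fields H to L also carry user data.
non_relationship_user_bytes = USER_DATA_BYTES + SURROGATE_BYTES + XREF_BYTES

# Sum of all the portions listed for a relationship record.
record_total_bytes = (
    KEYFIELD_BYTES
    + USER_DATA_BYTES
    + COMMON_DATA_BYTES
    + SURROGATE_BYTES
    + XREF_BYTES
)
```

The first sum reproduces the 85 bytes quoted earlier as the maximum for a record that is not a relationship.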

The advantage of the described arrangement is that if ABC Ltd. changes its name, the key of the third record will not change for telephone number. The second record does, of course, change.

Thus, the system of the present invention provides multi level data recordal, in contrast to systems where it is necessary to delete all sub-levels and re-enter all of the associated data. In the present invention, it is possible simply to "move" the sub-level which is no longer valid, and all subsequent levels also move "automatically", because the surrogate number does not change. The attribute value is moved from one entity type and attribute description to another. In contrast, the surrogate number would change if the no longer valid sub-level were to be rewritten rather than "moved".

The arrangement described in the preceding paragraph does not work if one is changing the whole of the value formed from the concatenation of entity type number, attribute number and version number, because with that unique number the surrogate number being used may already have been issued.

The gain in performance achieved by the present invention is that the system only reads very few records at any one time. This is very beneficial to user productivity. Overall, the system uses only simple, small steps--in contrast to most conventional systems in which the number of processing steps to be executed can take a sufficient length of time for the user to lose their concentration. In terms of virtual system computer hardware, the present invention will use very little paging.

Option Records

Reference has been made to the rules and procedures conventionally encoded in computer programs being, in the present invention, stored with the data. This has been conveniently implemented using records in which the 4 byte entity type number in the keyfield is left blank. For ease of reference, such records may be referred to as Option records. Option records can be used to regulate the path of data retrieval. That is, the content of an Option record can be used to determine which records are accessed in response to a particular input. This can be used to implement "relative" security within the system. The term "relative" does not here imply any potential compromise of security; in fact, the opposite. Relative security is often desired of, but rarely achieved in, conventional systems. It is the ability to assign, especially dynamically, different levels of security to a single user of the system. This can be a security level which varies in accordance with the application being used. Of course, assigning one digit within the Option record enables 10 security levels to be allocated for any particular function. Many security codes are stored to control security over many different functions.

Option records are used to define the valid record identifiers for use in a particular table. Option records are used to determine the way in which data is displayed. Option records are used to control the allocation of surrogate numbers. Option records are used to control the interval number applied between subsequent records. That is, the interval for surrogate records is set to "1" whereas the interval for paragraph headers or short names for text is set to "10000".

Text is identified by surrogate, then by paragraph header sequence number or paragraph numbers in document and then by the first five words. Text is a record type (structure) rather than function. Text is stored in the sequence identified by the surrogate numbers allocated upon its initial input. Consequently, one can change either the sequence number or the words--while retaining unique identification of the text.

It should be noted that text is integrated into the database and is not processed separately.

Reference numbers can be used to set priorities for batch processing.

All of these types of facility are controlled by option records.

Keyfield of an Option Record

A default may be set for entity type, entity and attribute. The purpose is to reduce the number of option records required. However, option records for addresses have a default set for entity type, but are specific for attribute.

A menu record is used to specify which default and which option records are to be used.

The value of digit A1 with the addition of "5" identifies whether the option record is concerned with data or with a menu.

Digits A2 to A7 inclusive store the entity type number, but the default setting is zero.

Digit A8 is not used in the described implementation of the invention and is therefore set either to "zero" or "9". It is always the same as in the data records for which it is used.

Digit A9 is set to the value of "1", because the record is identified by an option number.

Field B contains the version number. It can be zero and is determined by an option record. Although a general facility, the distinction between private and public data is particularly useful for option records, since it enables different rules to be encoded for different users (for example within a single "cell" of a spread sheet). The facility can also be used to identify a user version of an option record and, if found, to override the public version thereof with the user version.

Field C contains the entity number, and for an option record this is normally zero.

Field D contains the attribute number. For an option record, the value of field D is zero unless special characteristics are required for the table, in which case these will usually relate to a particular attribute. For example, address identified by post code within an option record for address.

The values of fields E and F are both zero.

Field G contains the number of the option record, using digits G1 to G5 inclusive. Duplicate option records are not permitted in the described implementation of the invention, thus digits G6 to G8 are allocated the respective values of "001". Digit G9 is allocated the value of "9".
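The field-by-field description of an option record keyfield can be drawn together in a short sketch. The function name and parameters are hypothetical, and the keyfield is assumed, for illustration, to be represented as seven nine-digit fields (A to G) concatenated into one 63-character string of decimal digits.

```python
def option_keyfield(
    option_number: int,
    *,
    menu: bool = False,
    entity_type: int = 0,
    a8: int = 0,
    version: int = 0,
    entity: int = 0,
    attribute: int = 0,
) -> str:
    """Build the 63-digit keyfield (fields A to G) of an option record."""
    if not 0 <= option_number <= 99_999:
        raise ValueError("option number must fit in digits G1 to G5")
    if not 0 <= entity_type <= 999_999:
        raise ValueError("entity type must fit in digits A2 to A7")
    if a8 not in (0, 9):
        raise ValueError("digit A8 must be 0 or 9")
    for value in (version, entity, attribute):
        if not 0 <= value <= 999_999_999:
            raise ValueError("fields B, C and D must fit in nine digits")
    a1 = (3 if menu else 0) + 5                   # 5 = data option, 8 = menu option
    field_a = f"{a1}{entity_type:06d}{a8}1"       # A9 = 1: identified by number
    field_g = f"{option_number:05d}" + "001" + "9"  # G6-G8 = 001, G9 = 9
    fields = [
        field_a,
        f"{version:09d}",    # field B
        f"{entity:09d}",     # field C, normally zero
        f"{attribute:09d}",  # field D, zero unless attribute-specific
        "0" * 9,             # field E, always zero
        "0" * 9,             # field F, always zero
        field_g,
    ]
    return "".join(fields)
```

With every default left at zero, only digit A1, digit A9 and field G carry information, which reflects the stated aim of reducing the number of option records required.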

The option record contains no cross reference field. Thus, 85 bytes are available for the storage of user data. This can be extended to 170 bytes by writing a second record, in which the value of digit G9 is set to "8" rather than "9". Thus, this is one example where a significant change may be made without the need to restructure the whole of the database.

Option records can be associated with specific applications. For such use, an application number is stored as part of the keyfield, conveniently in field C.

The option record can be considered as a filter which controls access to the table to which the option record applies.

Use of Option Records

Option records can be used to restrict access to view only, that is, to exclude edit. Thus, if security access is required to implement changes of address, an alternative option record would be provided to enable that facility.

Option records can be used to identify an exit program. Additionally, option records can be used to control reading, by the system of the present invention, of data files which were not recorded using the system of the invention--using exit programs.

Option records may be used to implement dynamic changes to user security levels.

Within the common data area of each record, three characters are allocated as a record identifier. Each of these three characters may be allocated a character in the range A to N or "0" to "9" inclusive, or certain special characters. That is, the total number of permissible identifiers is 40³. However, this does not restrict the total number of identifiers to sixty four thousand record types, because the identifier relates to the specific structure of the record, eg. name. These are record types, ie., attribute value types. The total number of identifiers under the heading entity is sixty four thousand. Similarly, the total number of possible entries under the heading attribute is sixty-four thousand, each of which can be iterated by one billion.

It is an option record which defines the three character record type code when new records are added.

Generally, one retrieves an entire table. That is, one receives all records (subject to security) rather than just those with the same record type, as defined by an option record.

Many option records may exist for a single table within the data base. Thus, different record types are permissible within a single table.

Within an option record, there is one byte which controls the length of name stored in records which are added to the data base using that option record. Thus, whereas conventionally there is a very high overhead in changing between or allowing for, say, a twenty five character name length rather than, say, a twenty four character name length, with the present invention one simply uses a different option record when writing the data to the disc. The name length is part of the common data area of the record written.

Option records may also be used to implement commitment control. In this respect, option records can be used to set transaction boundaries. Specifically, one option record may store the start sequence for the commitment control and another option record may control completion of the commitment control process. In a conventional arrangement, it is not possible to insert extra steps within a commitment control procedure without careful consideration of the consequences upon the procedural steps on either side of the point of insertion. However, consideration of such consequences can be avoided using the arrangement of the present invention.

Option records are used to control the number of records to be retrieved simultaneously. One character within the option record is used for this purpose and consequently the number of records to be displayed is readily controlled in the range 0 to 9 inclusive, where 0 can be high rather than low.

Option records can be used to control paragraph headers and paragraph text. That is, they can be used to display long or short names (any length or two hundred and fifty six characters).

An option record can be used to define whether or not a long name will be present.

Generally speaking, one line on the display screen is equivalent to one record. In this arrangement, a long name is one line per record and the name could be, say, ten thousand lines long.

Option records can be used to control processing, such as to prompt the display of a message to the user when access to requested information is denied. Similarly, option records are used to control process flow, for example to return a user to a menu or to the main related entity record.

The processing control which can be achieved with option records is extremely flexible. For example, it is very straightforward to implement option records which, upon input of a telephone number, identify the owner of that telephone number and switch processing to a menu so that other attributes of the owner can be selected for inspection.

An option record may contain the number of the next option record to be used to access another table.

Menu records also contain option record numbers. This number is added to the number in the option record being used to give the actual number used in processing.

Transaction processing can be used to implement Boolean logic. For example, consider a table comprising five entries. If the display control character is set to the value of "1", the processing searches for a single record. The option record controlling the transaction processing then stipulates what action is to be taken if that record is or is not found. Thus, option records can be set as decision makers within the database.

Consider, as an example, price updating in an ordering system. One wishes to implement time dependent pricing. This can be achieved using keyfields which are date and time stamped, ie., field E is set to date and field F is set to time. The 9th digit of the date is used to check that the date is correctly stored. The first digit of field F is used to check the accuracy of recordal of time. The option record is set to read the table backwards until a record is located which has a keyfield which is valid when compared with the current date and time. Using this system, one can readily implement a price expiry when a new price is input. That is, one reads backwards in time, effectively, to find the price recorded with the most recent date and time before the date of the transaction. This is to be contrasted with a conventional system, in which the price would typically be given a validity date range, from one date until another date. The difficulty with such a system is what processing should apply if the date of the transaction falls outside the date range of the prices recorded in the database. However, in the implementation according to the present invention, one does not need to encode an expiry date, because the price becomes redundant and invalid as soon as a new price is entered. All further transactions are then immediately conducted at the new price. Equally, one can "encode" discounts, market sectors, etc.
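The backward read for time dependent pricing can be sketched as a search on an ordered list. This is a minimal sketch under stated assumptions: price records are modelled as `(date, time, price)` tuples in ascending key order, dates and times are plain integers such as 19930601 and 120000, prices are whole numbers of the smallest currency unit, and the check digits mentioned above are omitted. The name `price_at` is hypothetical.

```python
from bisect import bisect_right
from typing import Optional


def price_at(
    prices: list[tuple[int, int, int]], date: int, time: int
) -> Optional[int]:
    """Return the most recently recorded price at or before (date, time).

    prices must be sorted ascending by (date, time). Returns None when
    no price had been recorded by the given moment.
    """
    # Position just past every record stamped at or before the transaction;
    # the infinite third element makes equal timestamps count as "before".
    pos = bisect_right(prices, (date, time, float("inf")))
    # Read backwards one record, if there is one.
    return prices[pos - 1][2] if pos > 0 else None
```

No expiry date appears anywhere in the data: appending a newer record is enough to supersede the old price for all later transactions.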

The important thing to note is that all of this "processing" is encoded within and takes place within the database itself. The processing is not stored in nor controlled by a "program" in the conventional sense. One simply writes an extra record into the database specifying what new discount is to apply, for example. This avoids one of the major disadvantages of conventional systems, which would typically require a duplicate database to be established (encoding the new price), with the consequential difficulties of keeping the two copies of the database consistent with each other until such time that the duplicate replaces the original to make the price change effective.

Option records are used to control processing in a manner analogous to error trapping. That is, the response of the system when no records are found or more than one record is found in response to a user inquiry is controlled by an option record. Further processing can also be controlled by the option record. For example, if the selected data item is not retrieved in response to an inquiry, the option record may present the user with the option of actually adding that data to the data base. Similarly, the option record can be set to advance automatically, or not, to the next processing stage when a requested item of data has been retrieved. For example, if one enters an address, there may be little benefit in advising the user that the address has actually been located within the system. It may be more beneficial to progress to the next stage of processing, which is to retrieve the associated name and present that name to the user for checking. In fact, this concept of using option records together with menus leads to the concept of the creation of procedures. Procedures execute a number of sequential processing steps, dependent upon the requirements of each step being individually fulfilled. This can be considered as a form of "one entry" menus which have an auto advance between each of them.

The common data area format is as for other records.

As previously mentioned, within the option record one digit controls the number of records to be displayed or accessed. If this digit has a value of "1" and the option record specifies auto advance, the option record has then become a procedure.

Procedures can be used to force data entry. For example, if one is using a menu which accesses one record at a time, an option record can be used to check for the existence of the specified record and, in the event that it is not located, to prompt for the value to be entered.

Default option records have the value for each of the fields associated with entity type, entity and attribute set to zero. However, it is possible to implement different levels of default, in which case the values within these fields may not all be zero. For example, if the attribute is to be non-blank, the attribute number would not be zero.

Field L contains the option record number.

Menus

In order to access records in a database in accordance with the present invention, it is necessary to specify the entity type number and the attribute number. The actual processing which takes place will often involve the use of an option record, including the use of an option record number which is accessed by a menu record. The menu record may also be coded to indicate if the option is to be taken as a default at the entity or attribute level. Digit A1 of the option record has a value of "5" or higher. That is, to form an option record "5" is added to the value of digit A1 of the type of record being accessed or controlled by the option record. The record type code, the three character record type code referred to above, begins with the letter "F". This three character record type code is part of the common data within each record within the database. The letter "F" as the first character of the code indicates which data structure is to be used.

The value of field C will be the application number for application-related menu records. The value of field C will be zero for an application independent menu. The corresponding option records determine of what type the menu record should be.

Menu records are identified by digit A1 having a value of "2" or "3". A value of "2" signifies that the menu record is for controlling routes to a current position, whereas a value of "3" signifies that the menu is for controlling routes after selecting or finding a record.
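The role of digit A1 described above can be illustrated with a short sketch. It is an illustration only: the function name describe_a1 and the wording of the returned descriptions are assumptions, the value "0" for a data record is taken from the description of transaction records, and any A1 value whose meaning is not stated in this part of the description is reported as such rather than guessed.

```python
# Meanings of digit A1 stated in this part of the description.
BASE_KINDS = {
    0: "data record",
    2: "menu record: routes to a current position",
    3: "menu record: routes after selecting or finding a record",
}


def describe_a1(a1):
    """Describe a record from the value of digit A1 (a single decimal digit)."""
    if a1 not in range(10):
        raise ValueError("digit A1 must be a single decimal digit")
    is_option = a1 >= 5                 # option records have A1 of 5 or higher
    base = a1 - 5 if is_option else a1  # A1 of the record type being controlled
    kind = BASE_KINDS.get(base, "record type not described in this section")
    if is_option:
        return "option record controlling a " + kind
    return kind


print(describe_a1(2))
print(describe_a1(7))
```

Because an option record is formed by adding 5 to A1, subtracting 5 recovers the type of record that the option record controls.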

The keyfield of a menu record is constructed in the same manner as generally described previously for other record types, apart from the value of digit A1. For example, the name of a menu is encoded as entity type.

The common data area of a menu record has the same format as any normal record type.

The three character record type code located in the common data area of the record has a format in which the first character is part of the exit program name. For example, in relation to exit programs, the first character of the three character record type code is the same as the fourth character of the program name. Using this nomenclature, all exit programs have a name in which the first character is E (signifying ERROS, the name given to the system of the present invention) followed by a two character version number. Thus, if one wishes to add a different record data structure, one does not have to change the main body of the core program, one simply appends an exit program or an existing one entered.

Transactions Summary

A transaction record is written for each transaction. The transaction record has the entity type number set to zero. Digit A1 has a value of "0", because it is a data record. Digit A8 is "0". Digit A9 has a value of "1", because it is a number.

Transaction numbers are generated by setting all of the digits of field D to the value of "9" and then reading backwards until the last transaction number is located. A value of "1" is then added to the last transaction number so as to form the new transaction number. This process is the same as that described above in relation to the generation of surrogate numbers.
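The generation of a new transaction number can be sketched as below. This is a hedged illustration: the function name next_transaction_number, the assumed nine-digit width of field D, and the use of a sorted Python list in place of the keyed file are all assumptions of the example.

```python
import bisect

FIELD_D_DIGITS = 9  # assumed width of field D


def next_transaction_number(existing):
    """Return the next transaction number.

    `existing` is an ascending list of the transaction numbers already
    written, standing in for the keyed file.
    """
    ceiling = 10 ** FIELD_D_DIGITS - 1  # every digit of field D set to 9
    # Position the read at the ceiling, then step back to the last record.
    index = bisect.bisect_right(existing, ceiling)
    last = existing[index - 1] if index else 0
    if last >= ceiling:
        raise OverflowError("field D cannot hold a larger transaction number")
    return last + 1


print(next_transaction_number([]))
print(next_transaction_number([1, 2, 7]))
```

Starting the read from the all-nines key means that the highest number in use is found in one backward step, without a separate counter record having to be maintained.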

Typically, a transaction record might store details such as: user number, company, department, date and time, application number. Use of the transaction summary enables ready implementation of an "undo" facility.

Navigation and the Browse Facility

One can browse or navigate through the database, moving between entity type, entity and attribute. Using the keyfield, and particularly fields I to L of an attribute, one can identify the entity type and identifier. The value stored in field I identifies the entity type. That is, it identifies the value of field A of the related record, except that digit A8 is set to "0" in the related record, compared with having a value of "9" in the cross reference field of the attribute. The version number of the related record can be determined from an inspection of digits I2 to I5. If all of these digits have a value of "0", then field B of the related record contains the user number. However, if digits E2 to E5 are not all zero, then digits B1 to B5 of the related record are zero and digits B6 to B9 have, respectively, the values of digits E2 to E5.

Thus, the values of fields A and B of the related record are determined.

The entity number and attribute number of the related record, that is the value of fields C and D, are zero. This is the case because one is considering the starting point (or top) of the data tree.

Generally, the values of fields E and F remain unchanged between the attribute record and the related record. However, the last four digits of field J give the value of the last four digits of field G of the related record, although this usually means no change.

Thus, the entire keyfield of the related record has been constructed from the context. Moreover, this process of constructing the keyfield of a related record can be used in either the forward or backward direction.

Much can be achieved simply by using attributes. For example, consider the difference between "messages", which simply lists all messages, and "messages received", which lists only those messages sent to the particular user in question.

Context

One can only access data in a particular context, that is as a result of a specific enquiry. Thus, when one views the content of a particular record, the context of the information stored in the record is known. Moreover, the information is never out of context.

Application Structure

An application can be considered as a filter to the overall database. For example, one may wish to assign a subset of the database to invoice processing. An application allows access to the records, procedures, etc. which are relevant to invoice processing.

An application is an entity type. This is illustrated in the semantic networks shown in FIGS. 5a and 5b of the accompanying drawings.

A menu filters the attributes which are available for each entity type in an application. Thus, one only needs to store the attributes once.

If one wishes to access all attributes, one should use an application independent menu (i.e. no filter). This is to be contrasted with the conventional approach of "coding" the required attributes within an application program. The overall concept is to take the processing away from "encoded programs" and put the processing as part of the actual data storage.
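The filtering role of a menu can be sketched as follows. The sketch is illustrative only: the dictionaries ATTRIBUTES and MENUS, the function name visible_attributes, the sample attribute names, and the use of None to denote an application independent menu are assumptions of the example, not structures taken from the description.

```python
# Attributes are stored once per entity type, independently of any application.
ATTRIBUTES = {
    "company": ["name", "telephone number", "invoice total", "credit limit"],
}

# A menu selects, for one application and entity type, the attributes shown.
MENUS = {
    ("invoice processing", "company"): {"name", "invoice total"},
}


def visible_attributes(application, entity_type):
    """Return the attributes of an entity type as seen through a menu.

    Passing application=None models an application independent menu,
    which applies no filter.
    """
    stored = ATTRIBUTES.get(entity_type, [])
    if application is None:
        return list(stored)
    allowed = MENUS.get((application, entity_type), set())
    return [attribute for attribute in stored if attribute in allowed]


print(visible_attributes("invoice processing", "company"))
print(visible_attributes(None, "company"))
```

The attribute list is never copied into the application; changing what an application exposes means changing a menu entry, which is data, rather than changing a program.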

An example of an application is indicated by the semantic network shown in FIG. 6 of the accompanying drawings. The application is "security" in which the entity type "application" is available. An attribute of the entity type is "authorised to". Again this is to be contrasted with the conventional arrangement in which the security is encoded within each application.

The operator can only use applications which he has access to because he has the appropriate attributes as a user, i.e., data and rules are considered as being essentially the same. Thus, a user navigates through rules in the same way that he navigates through data.

Also available within the system is a start-up application. This application enables one to change the start-up procedure without changing the core program. An attribute of "applications authorised" lists those applications which are available to be run. After having listed the applications, one commences navigation through the database.

Using this technique, nested applications can readily be implemented and security checks can be included at each level.

Similarly, it is possible to encode a new document creation application. Such an application may include procedures to solicit data such as author name etc.

Files

Only one file (or database) is required. The keyfield of each record is unique and therefore there is no need to establish more than one file. However, for operational reasons, it may be desirable to split the database into several different files. For example, one may wish to make such a split in order to separate user data from public data. Similarly, one may wish to make such a split in order to separate application definition data from the data to which the application applies. Thus, for distribution purposes one can change the application without overwriting the user data to which the application applies. Thus, third party "software" can be implemented using the system of the present invention in a manner directly analogous to that of conventional systems.

Another set of circumstances in which it may be desirable to establish more than one file is where one wishes to purge or archive data. This could be particularly applicable where the data comprises messages or documents.

Another instance where it may be desirable to establish more than one file is where the user data is of specific interest or type, for example a database concerning Fine Art.

A further example where one may wish to establish more than one file is where configuration data is to be stored, concerning for example the list of computer terminals and the operators who may access the computer via those terminals etc.

In the embodiment of the invention described herein, a total of nine different files have been used. These files contain:

(1) The word directory

(2) The user data

(3) Private data

(4) Public application data (eg. a list of countries)

(5) Public application data; specific (art history) rather than general

(6) Application definitions (including the Thesaurus)

(7) Configuration file

(8) Message text file

(9) Transaction Summary

The only other part of the system is the core program, which contains less than one megabyte of instructions.

Thesaurus

An important feature is the thesaurus which is distinct from the word directory file. The thesaurus forms part of the main database and "thesaurus" is an entity type. The thesaurus includes phrases (in any language), in contrast to the word directory which only includes individual words. Every term used to define the structure of an application of the database must first be entered into the thesaurus. As part of that process, a surrogate number is allocated.

Every word used as an identifier throughout the system must be included in the word directory. However, the thesaurus does not necessarily include the whole of the word directory.

The thesaurus is the central repository of terms. Every term required to define the structure of the database must be included in the thesaurus. Thus, to add a new entity type, it is first necessary to put the entity type name into the thesaurus. This is illustrated by the semantic network shown in FIG. 7 of the accompanying drawings.

Subsequently, one can access the thesaurus to discover whether the particular name is an entity type.

The thesaurus forces the use of standard words and phrases. Thus, digit G5 may allow duplicates, but they will each have a different surrogate number.

As an example of the contrast between the word directory and the thesaurus: the word directory stores the word "red" only once. The thesaurus may have the word "red" stored several times, for example as a colour as a first occurrence and in relation to politics as a second example. One can thus interrogate the thesaurus to retrieve a list of the potential meanings of the word "red".

The thesaurus allows one to find out what other knowledge there is in the system relating to the word "red".
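The contrast between the word directory and the thesaurus can be sketched as below. This is an illustration under stated assumptions: the class names WordDirectory and Thesaurus, the method names, the sequential numbering scheme and the free-text "sense" label are all hypothetical and are used only to show one number per word against one surrogate number per meaning.

```python
class WordDirectory:
    """Each individual word is stored once and has one number."""

    def __init__(self):
        self._numbers = {}

    def number_for(self, word):
        # A repeated word is not stored again; its existing number is returned.
        if word not in self._numbers:
            self._numbers[word] = len(self._numbers) + 1
        return self._numbers[word]


class Thesaurus:
    """The same term may be entered several times, each with its own surrogate."""

    def __init__(self):
        self._entries = []  # (surrogate number, term, sense)

    def add(self, term, sense):
        surrogate = len(self._entries) + 1  # a new surrogate for every entry
        self._entries.append((surrogate, term, sense))
        return surrogate

    def meanings(self, term):
        """List (surrogate, sense) for every entry recorded under the term."""
        return [(s, sense) for s, t, sense in self._entries if t == term]


words = WordDirectory()
thesaurus = Thesaurus()
print(words.number_for("red") == words.number_for("red"))
thesaurus.add("red", "colour")
thesaurus.add("red", "politics")
print(thesaurus.meanings("red"))
```

Interrogating the thesaurus for "red" returns every recorded meaning, each distinguishable by its surrogate number, whereas the word directory only ever answers with the single number for the word itself.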

The thesaurus is not used in all transaction processing. For example, consider the attribute description "is a". Although the phrase "is a" may be retrieved from the thesaurus, the navigation through the database is immediate, because the processing is from that initial thesaurus lookup through entity type to entity. The expression "thesaurus" is an entity type and "entity type" is in the thesaurus.

Multiple User Groups

It is possible to accommodate multiple user groups on one system. One may create a separate file for private user data and new user groups may be added with ease. In searching records, the core program always looks for the user group number.

A member of one user group can send messages to a member of another user group.

Using the user group number as part of the keyfield, inherent security is achieved. Cross access between data belonging to different user groups cannot be achieved. Thus, it is very straightforward to prevent access to other group data or vice versa, because this is part of the keyfield of the relevant records.
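The way in which the user group number in the keyfield separates the data of different groups can be sketched as follows. This is a simplified illustration: the class name Database, the tuple used in place of a concatenated numeric keyfield, and the order of its components are assumptions of the example.

```python
class Database:
    """A single keyed store in which the user group is part of every key."""

    def __init__(self):
        self._records = {}

    def write(self, user_group, entity_type, entity, attribute, data):
        # The tuple stands in for the concatenated numeric keyfield.
        self._records[(user_group, entity_type, entity, attribute)] = data

    def read(self, user_group, entity_type, entity, attribute):
        # The searcher's own group number is always part of the key sought,
        # so a record written by another group can never match.
        return self._records.get((user_group, entity_type, entity, attribute))


db = Database()
db.write(17, "company", "ABC Limited", "telephone number", "555 0100")
print(db.read(17, "company", "ABC Limited", "telephone number"))
print(db.read(42, "company", "ABC Limited", "telephone number"))
```

No separate access check is performed in this sketch; the second read fails simply because the key it constructs differs from the key under which the record was written.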

Core Program Structure

(1) "Request processing" program. Main operation interface.

(2) Main program which accesses database. Overall controlling program. This does all calling of other programs.

(3) "Database update" program, which also provides text processing interface with the operator.

(4) Exit programs: ERROS or user defined, e.g. maths/printing/differential screen display.

Exit Programs

Exit programs conveniently adopt the following naming convention, part of which has been referred to above. The first character of the exit program is always the character E (signifying ERROS, the name used in relation to the system of the present invention). The second character specifies the release number of the program. The third character identifies the modification of the program. The fourth character is the first character of the three character record identifier of the type of record being processed by this exit program. The fifth character of the exit program name specifies the version number of the program. This character can be assigned any character in the range A to Z inclusive or any digit in the range 0 to 9 inclusive or some special characters. The sixth to tenth characters of the exit program name identify the user group number. This may be zero, indicating that it is available to any user group.
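The naming convention can be illustrated by a small parser. This is a sketch only: the names ExitProgramName and parse_exit_program_name, the sample program name "E10FA00000", and the assumption that the user group number is written as five decimal digits are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitProgramName:
    release: str              # second character
    modification: str         # third character
    record_type_initial: str  # fourth character, first letter of the record type code
    version: str              # fifth character (letter, digit or special character)
    user_group: int           # sixth to tenth characters; zero means any user group

    @property
    def is_public(self):
        return self.user_group == 0


def parse_exit_program_name(name):
    """Split a ten character exit program name into its parts."""
    if len(name) != 10 or name[0] != "E":
        raise ValueError("an exit program name has ten characters and begins with E")
    group = name[5:10]
    if not (group.isascii() and group.isdigit()):
        raise ValueError("characters six to ten must be the user group number")
    return ExitProgramName(name[1], name[2], name[3], name[4], int(group))


parsed = parse_exit_program_name("E10FA00000")
print(parsed)
print(parsed.is_public)
```

Because the record type initial and the user group number are both carried in the name, a program for a given record type and user group could in principle be located from the name alone.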

User specific option records identify which version of the program should be used, and whether or not it is their own user group or a publicly available version of the program which is called by the option record, for a particular user group.

This naming convention makes it easy to avoid confusing different versions of the same program and every node can define for each user group which version of the program is to be used. This may include simple choices such as different screen painting or could incorporate more esoteric requirements.

The exit programs may be contained in a program library. The name given to the program library should preferably use the following naming convention. The first character of the library name is the character E. The second character identifies the release number. The third character identifies the modification. The fourth character identifies the modification level. This naming convention is particularly useful for certain computer hardware types. For example, the IBM AS400 machine uses a library list in which each job has an associated library list.

A major advantage of the system according to the present invention is the small amount of memory resident code (core program) which is required. Indeed, this advantage arises from the fact that the core program is sufficiently small that it can always be resident in memory on a machine such as the IBM AS400. Consequently, the requirement for paging of programs is largely avoided.

Help System

An on-line help facility is readily implemented, by specifying the help file as an entity type. Thus, with an entity type of "help", an entity of "business" and an attribute of "telephone number", one forms a surrogate record of the relationship between "business" and "telephone number", which record contains a description of that combination. Since the entity type becomes the entity when considered in a different context, very little processing by the core program is required in order to implement the help system. The help text is stored for a particular entity type rather than for each individual entity.

Help menus need not be application specific.

Use of the help system is thus the same as access to any other type of data recorded on the database. One navigates through the relevant records in the same manner as described previously.

The above described on-line help system completely avoids the conventional approach which is to establish a fully separate on-line help database and associated set of programs.

The specific description of the invention given above was set out in British patent application number 9320404.8 filed on 4th Oct. 1993. Since that date various modifications have been implemented and tested. Of note among such modifications has been an embodiment in which the above described Option Records have been functionally merged with the above described Menu Records. The principles remain unchanged, but a single record type is used in place of the previous two. Another modification has been the use of the Surrogate Records to store further data. As described above, the Surrogate Records exist to record the surrogate number, which can be considered as the primary internal identifier of the system. Additional data can however be stored in the Surrogate Record. Such additional data should relate to the content of the records to which the surrogate relates and, preferably, be displayed with the content of the related records.

Since October 1993 the initial implementation of the invention has been independently tested. Such tests have calculated the improvement in efficiency of developing a new application (i.e. suite of "programs") as being as much as 500% or more, compared with conventional programming techniques. The tests have also reported improvements of the same order of magnitude in operating speeds of the completed application.

I claim:
 1. A method of storing and retrieving data using an electronic file, the method comprising the steps of: providing a list of alphanumeric descriptions to each of which is assigned a number; entering data to be stored in terms of an entity type and an attribute of the entity type; retrieving from the said list the respective numbers for the entity type and for the attribute; forming a keyfield by concatenating the number identifying the entity type with the number identifying the attribute of the entity type; and writing a record to the electronic file, the record comprising the said keyfield and a data part.
 2. A method as claimed in claim 1, wherein the step of entering data to be stored includes entering the data to be stored in terms of an entity of the said entity type in addition to the entity type and the attribute of the entity type; wherein the step of retrieving from the said list includes retrieving the respective number for the entity; and wherein the step of forming the keyfield comprises concatenating the three numbers retrieved from the said list.
 3. A method as claimed in claim 1, wherein part of the keyfield indicates the structure of the data part of the record.
 4. A method as claimed in claim 1, wherein some of the records store data and other of the records store details of the relationships between data.
 5. A method as claimed in claim 1, wherein some of the records store data and other of the records control data processing.
 6. A method as claimed in claim 1, wherein the storage of an item of information involves the formation and recordal of three records each of which contains identifiers enabling the other two records to be identified.
 7. A method as claimed in claim 1, wherein the numeric values of the said provided list is calculated for alphanumeric descriptions with reducing significance being assigned to words in an alphanumeric description after the first word in the list.
 8. A data storage and retrieval system using an electronic file, comprising: storage means containing a list of alphanumeric descriptions to each of which is assigned a number; means for entering data to be stored in terms of an entity type and an attribute of the entity types; means for retrieving from the said list the respective numbers for the entity type and for the attribute; means for forming a keyfield by concatenating the number identifying the entity type with the number identifying the attribute of the entity type; and means for writing a record to the electronic file, the record comprising the said keyfield and a data part.